Learning Prosodic Focus from Continuous Speech Input: A Neural Network Exploration
Abstract
This study uses connectionist modeling to explore whether and how infants might learn prosodic focus directly from continuous speech input. Focus is a communicative function that emphasizes a particular part of an utterance, and it is encoded mainly by pitch variations. Acquiring focus entails two major difficulties. First, focus-related pitch patterns are confounded by other linguistic functions that also use pitch for their encoding, such as lexical tone in a tone language. Second, speakers have different pitch ranges, which further confounds the focus-related pitch patterns. In three simulations using self-organizing neural networks, we explored how focus may be learned from continuous acoustic signals in Mandarin that were produced with co-occurring lexical tones and by multiple speakers. We used sentence-sized F0 contours as well as their velocity profiles (D1) as training input. Results show that both F0 and D1 contours provide information for focus learning, but only the D1-trained network adequately handled the variability introduced by cross-gender differences. The recognition rate was comparable to human performance. Implications of these findings for theories of language acquisition and adult speech perception are discussed.
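The contrast between F0 and D1 input can be illustrated with a minimal sketch. It is not the study's actual preprocessing: it assumes that D1 is approximated by the first difference of an F0 contour expressed in semitones, and the contour values, the 100 Hz reference, the 10 ms frame step, and the function names are all hypothetical. Under those assumptions, a second speaker producing the same contour shape one octave higher differs from the first in F0 but not in D1.

```python
import math

def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert F0 values in Hz to semitones relative to ref_hz (assumed reference)."""
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]

def velocity_profile(contour, step_s=0.01):
    """Discrete D1: first difference of the contour divided by the frame step."""
    return [(b - a) / step_s for a, b in zip(contour, contour[1:])]

# Hypothetical contour (Hz) for a lower-pitched speaker.
low_speaker_hz = [100.0, 110.0, 130.0, 120.0, 105.0]
# Same shape one octave higher: an idealized higher-pitched speaker.
high_speaker_hz = [2.0 * f for f in low_speaker_hz]

low_st = hz_to_semitones(low_speaker_hz)
high_st = hz_to_semitones(high_speaker_hz)

d1_low = velocity_profile(low_st)
d1_high = velocity_profile(high_st)

# Largest pointwise difference between the two speakers in each representation.
f0_gap = max(abs(a - b) for a, b in zip(low_st, high_st))
d1_gap = max(abs(a - b) for a, b in zip(d1_low, d1_high))

print(f"max F0 difference (semitones): {f0_gap:.3f}")
print(f"max D1 difference (semitones/s): {d1_gap:.6f}")
```

The F0 contours differ by a constant 12 semitones, while the D1 profiles coincide up to floating-point error. Real cross-gender differences are not a pure octave shift, so this only shows why a velocity-based representation can be less sensitive to speaker pitch range.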
Similar resources
Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer
Prosodic structure generation from text plays an important role in Chinese text-to-speech (TTS) synthesis and greatly influences the naturalness and intelligibility of the synthesized speech. This paper proposes a multi-task learning method for prosodic structure generation using a bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) and a structured output layer (SOL). Unl...
A Neural Learning Approach for Duration Parameter Generation in Mandarin Speech Synthesis
This paper investigates a neural learning approach designed to generate duration parameters for Mandarin speech synthesis. Unlike traditional rule-based methods, the novelty of this method lies in combining a neural learning strategy with prior linguistic knowledge to obtain the duration parameters. Rules generalized by linguists are used to encode input vectors of the neu...
Sound Pattern Matching for Automatic Prosodic Event Detection
Prosody in speech is manifested by variations of loudness, exaggeration of pitch, and specific phonetic variations of prosodic segments. For example, stressed and unstressed syllables differ in place or manner of articulation: vowels in unstressed syllables may have a more central articulation, and vowel reduction may occur when a vowel changes from a stressed to an unstr...
Double-Star Detection Using Convolutional Neural Network in Atmospheric Turbulence
In this paper, we investigate the usage of machine learning in the detection and recognition of double stars. To do this, numerous images including one star and double stars are simulated. Then, 100 terms of Zernike expansion with random coefficients are considered as aberrations to impose on the aforementioned images. Also, a telescope with a specific aperture is simulated. In this work, two k...
Introducing Deep Modular Neural Networks with a Double Spatio-Temporal Structure to Improve Persian Continuous Speech Recognition
In this article, growable deep modular neural networks for continuous speech recognition are introduced. These networks can be grown to implement the spatio-temporal information of the frame sequences at their input layer as well as their labels at the output layer at the same time. The trained neural network with such double spatio-temporal association structure can learn the phonetic sequence...